Skip to content

fix(appkit): backpressure inline-Arrow stash before EXTERNAL_LINKS fallback#390

Open
jamesbroadhead wants to merge 1 commit into
stack/arrow-3-inline-arrow-fixfrom
stack/arrow-4-stash-backpressure
Open

fix(appkit): backpressure inline-Arrow stash before EXTERNAL_LINKS fallback#390
jamesbroadhead wants to merge 1 commit into
stack/arrow-3-inline-arrow-fixfrom
stack/arrow-4-stash-backpressure

Conversation

@jamesbroadhead
Copy link
Copy Markdown
Contributor

Summary

Follow-up to #329. On Reyden-backed warehouses, the analytics plugin's stash-full fallback target (EXTERNAL_LINKS + ARROW_STREAM) does not exist — Reyden returns HTTP 501 NOT_IMPLEMENTED: "ExternalLinks disposition is not yet implemented." — so any stash overflow on a Reyden app would surface that 501 to the caller with no further recovery.

This PR adds brief FIFO backpressure inside the stash so transient capacity pressure is absorbed by waiting for the next /arrow-result drain (typically tens of ms — the stash is drain-on-read) rather than immediately escalating to EXTERNAL_LINKS. The existing EXTERNAL_LINKS fallback is preserved for true sustained overload on warehouses that accept it.

What's the fix

  • InlineArrowStash: new putWaitMs option and async putBlocking(userId, bytes, signal?). On overflow it waits FIFO up to putWaitMs for a take() / TTL-gc to free a slot, then retries. Honors AbortSignal (aborted request drops out of the queue immediately). When the wait elapses without capacity, returns null and the caller falls back as before.
  • AnalyticsPlugin: stash now constructed with putWaitMs: 500; the call site awaits putBlocking and passes the request's abort signal.
  • Wait-queue is strictly FIFO. wakeWaiters() walks from the head and shifts each fitting waiter off before invoking it, so a re-entrant gc() inside the waker can't pick up the same waiter twice.

Surface area is small: the synchronous put() is unchanged, the EXTERNAL_LINKS fallback path is unchanged, and putWaitMs = 0 (the default) reduces putBlocking to the synchronous behavior.

Why not the alternatives

  • Probe warehouse capability up-front, skip EXTERNAL_LINKS on Reyden: more state, more code, doesn't help if a different warehouse changes behavior later.
  • Catch the 501 and surface a clearer error: improves diagnostics, but the request still fails — backpressure prevents the failure.
  • Drop the stash entirely and content-negotiate Arrow bytes inline in the HTTP response: structurally cleaner, but a wire-protocol shift better done as a separate PR.

Tests

inline-arrow-stash.test.ts (+8 cases):

  • putBlocking succeeds immediately when capacity is available
  • waits for a take() to free a slot, then succeeds
  • returns null when putWaitMs elapses without a slot freeing
  • preserves FIFO order across waiters
  • rejects later waiters when the head consumes the freed capacity
  • settles null when AbortSignal aborts mid-wait
  • pre-aborted signal short-circuits (no wait)
  • putWaitMs=0 (default) behaves like the synchronous put()

analytics.test.ts: existing "stash full → EXTERNAL_LINKS" integration test now stubs putBlocking directly instead of put, so the suite doesn't bake the 500ms wait into the run time.

Full suite: 2,682 tests, all green (was 2,674 on the base).

Stack

Targets stack/arrow-3-inline-arrow-fix; will rebase onto main after #329 merges.

Test plan

Verified live against e2-dogfood Test Reyden (warehouse_type=REYDEN, id 0000000002e475d4):

  • JSON_ARRAY caller: single SSE result event with row data, no fallback (unchanged)
  • ARROW_STREAM caller: single SSE arrow event → GET /arrow-result/inline-<uuid> returns 776 bytes of valid Arrow IPC, application/vnd.apache.arrow.stream (unchanged — new code only activates under capacity pressure)

The new code path (backpressure under contention) is covered by the unit tests; reproducing it live requires saturating the stash, which is outside the scope of this PR.

This pull request was AI-assisted by Isaac.

The ARROW_STREAM path in the analytics plugin stashes inline Arrow IPC
bytes server-side and emits a synthetic `inline-<uuid>` id over SSE; the
client GETs `/arrow-result/<id>` to drain the stash. When the stash is
full, the plugin falls back to `EXTERNAL_LINKS + ARROW_STREAM`.

That fallback target is not universal. Reyden refuses
`EXTERNAL_LINKS + ARROW_STREAM` outright (HTTP 501 NOT_IMPLEMENTED,
"ExternalLinks disposition is not yet implemented."). On a
Reyden-backed app, any stash overflow would propagate that 501 to the
caller with no further recovery, even though every relevant inline
slot frees within one HTTP round-trip (the stash is drain-on-read).

Add brief FIFO backpressure inside the stash. `putBlocking()` first
tries the existing synchronous `put()` and, on overflow, waits up to
`putWaitMs` for an in-flight `take()` (or TTL gc) to free a slot, then
retries. The analytics plugin constructs the stash with
`putWaitMs: 500` and `await`s the new method at the existing call
site. Honors `AbortSignal` so a cancelled request gives up the wait
immediately. The EXTERNAL_LINKS fallback path is preserved for true
sustained overload on warehouses that accept EXTERNAL_LINKS.

The wait queue is strictly FIFO. After a take() / clear() / TTL gc
frees capacity, wakeWaiters() walks the queue from the head and
shifts each fitting waiter off before invoking it, so a re-entrant
gc() inside the waker cannot pick the same waiter up twice.

Tests cover: immediate-success, wait-then-succeed via take(), timeout
to null, FIFO order across waiters, head-consumes-capacity rejecting
the tail, mid-wait abort, pre-aborted signal, and the putWaitMs=0
default behaving like the synchronous put(). The existing
"stash full -> EXTERNAL_LINKS" integration test now stubs
putBlocking directly so it does not bake the 500ms wait into the
suite.

Verified live against e2-dogfood "Test Reyden" (warehouse_type=REYDEN):
ARROW_STREAM caller returns valid Arrow IPC bytes via the inline path
unchanged; the new code only activates under capacity pressure.

This pull request was AI-assisted by Isaac.

Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
@jamesbroadhead jamesbroadhead requested a review from a team as a code owner May 19, 2026 16:35
@jamesbroadhead jamesbroadhead requested review from pkosiec and removed request for a team May 19, 2026 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant